AITopics | please describe

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

On Epistemic Uncertainty of Visual Tokens for Object Hallucinations in Large Vision-Language Models

Neural Information Processing Systems | Jun-22-2026, 12:07:08 GMT

Large vision-language models (LVLMs), which integrate a vision encoder (VE) with a large language model, have achieved remarkable success across various tasks. However, crucial challenges remain in LVLMs, such as object hallucination: generating descriptions of objects that are not in the input image. Here, we argue that uncertain visual tokens within the VE are a key factor that contributes to object hallucination. Our statistical analysis found positive correlations between visual tokens with high epistemic uncertainty and the occurrence of hallucinations. Furthermore, we show theoretically and empirically that visual tokens in early VE layers that exhibit large representation deviations under small adversarial perturbations indicate high epistemic uncertainty. Based on these findings, we propose a simple yet effective strategy to mitigate object hallucination by modifying the VE only. Our method comprises a proxy method with adversarial perturbations for identifying uncertain visual tokens efficiently and a method to mask these uncertain visual tokens during the self-attention process in the middle layers of the VE, suppressing their influence on visual encoding and thus alleviating hallucinations. Extensive experiments show that our method significantly reduces object hallucinations in LVLMs and can work synergistically with prior art.

large language model, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Transportation > Passenger (0.67)
Leisure & Entertainment (0.67)
Transportation > Ground > Road (0.46)
Transportation > Ground > Rail (0.45)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

WolBanking77: Wolof Banking Speech Intent Classification Dataset

Neural Information Processing Systems | Jun-22-2026, 07:34:33 GMT

Intent classification models have made significant progress in recent years. However, previous studies primarily focus on high-resource language datasets, which results in a gap for low-resource languages and for regions with high rates of illiteracy, where languages are more spoken than read or written. This is the case in Senegal, for example, where Wolof is spoken by around 90% of the population, while the national illiteracy rate remains at 42%. Wolof is spoken by more than 10 million people in the West African region. To address these limitations, we introduce the Wolof Banking Speech Intent Classification Dataset (WolBanking77) for academic research in intent classification.

large language model, machine learning, natural language, (23 more...)

Neural Information Processing Systems

Country:

Africa > Senegal (1.00)
Asia (0.93)
Europe > France > Provence-Alpes-Côte d'Azur (0.28)
North America > United States > Minnesota (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry:

Law (1.00)
Education (1.00)
Banking & Finance (1.00)
(4 more...)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
(5 more...)

9d411e87d0f37059f40fb27c5de00ba0-Supplemental-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing Systems | Jun-20-2026, 15:52:12 GMT

The following section contains answers to the questions listed in datasheets for datasets. A.1 Motivation. Question: For what purpose was the dataset created? Was there a specific task in mind? Was there a specific gap that needed to be filled? Answer: To evaluate the linguistic robustness of language models across diverse English varieties by transforming Standard American English (SAE) datasets. Question: Who created the dataset (e.g., which team, research group) and on behalf of which entity (e.g., company, institution, organization)? Answer: The authors of this paper. Question: Who funded the creation of the dataset? If there is an associated grant, please provide the name of the grantor and the grant name and number.

large language model, machine learning, natural language, (23 more...)

Neural Information Processing Systems

Genre: Research Report (0.34)

Industry: Law (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.46)

SentinelKilnDB: A Large-Scale Dataset and Benchmark for OBB Brick Kiln Detection in South Asia Using Satellite Imagery (Supplementary Information)

Neural Information Processing Systems | Jun-20-2026, 11:45:56 GMT

The questions are presented in blue, with our corresponding responses shown in black. For what purpose was the dataset created? Was there a specific task in mind? This dataset was created for academic and research purposes to advance scientific understanding and support policy development on air quality and sustainability issues. The findings highlight important opportunities to improve regulatory compliance and encourage the adoption of cleaner technologies within the brick kiln sector, which is a significant contributor to regional air pollution. Beyond its environmental relevance, this dataset is especially valuable for the fields of object detection and computer vision. It provides a large-scale, hand-validated collection of brick kiln locations annotated with oriented bounding boxes (OBBs) on freely available Sentinel-2 satellite imagery.

artificial intelligence, deep learning, machine learning, (17 more...)

Neural Information Processing Systems

Country: Asia > India (1.00)

Industry:

Law (1.00)
Government (0.88)
Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.71)

Technology:

Information Technology > Artificial Intelligence > Vision (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

74512f3c5831ea3fbd04984cb1a2d29d-Paper-Conference.pdf

Neural Information Processing Systems | Jun-18-2026, 12:31:45 GMT

Select all applicable actions that occur and output None if not observed. Qwen2.5-VL:

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.68)

ImageNet-Hard: The Hardest Images Remaining from a Study of the Power of Zoom and Spatial Biases in Image Classification

Neural Information Processing Systems | Jun-18-2026, 12:08:23 GMT

Image classifiers are information-discarding machines, by design. Yet, how these models discard information remains mysterious. We hypothesize that one way for image classifiers to reach high accuracy is to zoom to the most discriminative region in the image and then extract features from there to predict image labels, discarding the rest of the image. Studying six popular networks ranging from AlexNet to CLIP, we find that proper framing of the input image can lead to the correct classification of 98.91% of ImageNet images. Furthermore, we uncover positional biases in various datasets, especially a strong center bias in two popular datasets: ImageNet-A and ObjectNet. Finally, leveraging our insights into the potential of zooming, we propose a test-time augmentation (TTA) technique that improves classification accuracy by forcing models to explicitly perform zoom-in operations before making predictions. Our method is more interpretable, accurate, and faster than MEMO, a state-of-the-art (SOTA) TTA method. We introduce ImageNet-Hard, a new benchmark that challenges SOTA classifiers including large vision-language models even when optimal zooming is allowed.

artificial intelligence, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

North America > Canada (0.28)
North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Law (1.00)
Government (0.92)
Information Technology > Security & Privacy (0.45)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.67)

3e5b0db387078ac4968fd536d3c3a019-Supplemental-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing Systems | Jun-16-2026, 16:56:03 GMT

For models trained for multi-image input, text prompt is: Which objects are present in both images? You can think of your answer in any way (e.g. For models where we first concatenate the input images, the text prompt is: There are two images provided, one on the left and the other on the right. Which objects are present in both images? You can think of your answer in any way (e.g. We used the following procedure to guide our creation of images.

large language model, machine learning, natural language, (23 more...)

Neural Information Processing Systems

Industry:

Law (0.68)
Government (0.47)

Technology:

Information Technology > Communications (0.69)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.49)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.31)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

1ec408df112bc9b186d7b8fe0ada902a-Supplemental-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems | May-1-2026, 02:17:57 GMT

artificial intelligence, dataset, machine learning, (19 more...)

Neural Information Processing Systems

Industry:

Law (0.95)
Government (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.95)

StoryBench: A Multifaceted Benchmark for Continuous Story Visualization

Neural Information Processing Systems | Apr-30-2026, 08:10:03 GMT

Generating video stories from text prompts is a complex task. In addition to having high visual quality, videos need to realistically adhere to a sequence of text prompts whilst being consistent throughout the frames. Creating a benchmark for video generation requires data annotated over time, which contrasts with the single caption often used in video datasets. To fill this gap, we collect comprehensive human annotations on three existing datasets, and introduce StoryBench: a new, challenging multi-task benchmark to reliably evaluate forthcoming text-to-video models. Our benchmark includes three video generation tasks of increasing difficulty: action execution, where the next action must be generated starting from a conditioning video; story continuation, where a sequence of actions must be executed starting from a conditioning video; and story generation, where a video must be generated from only text prompts. We evaluate small yet strong text-to-video baselines, and show the benefits of training on story-like data algorithmically generated from existing video captions. Finally, we establish guidelines for human evaluation of video stories, and reaffirm the need for better automatic metrics for video generation. StoryBench aims at encouraging future research efforts in this exciting new area. Work completed during an internship at Google.

artificial intelligence, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country: